Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement IAMF decoder #3709

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

Implement IAMF decoder #3709

wants to merge 15 commits into from

Conversation

osagie98
Copy link
Contributor

@osagie98 osagie98 commented Jun 27, 2024

Implements a libiamf-based audio decoder.

Decoder specific code is guarded behind a GN flag enable_iamf_decode, which controls the ENABLE_IAMF_DECODE macro. enable_iamf_decode is off by default, so the new code doesn't build.

The decoder is implemented for Linux and Android TV.

b/341792042

starboard/shared/libiamf/iamf_config_reader.cc Outdated Show resolved Hide resolved
std::optional<uint32_t> mix_presentation_id_;

bool has_valid_config_ = false;
bool binaural_mix_presentation_id_ = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are "has_valid_config_" and "binaural_mix_presentation_id_" used?


int bytes_read = 0;
bool error = true;
while (bytes_read < 128 && buf[bytes_read] != '\0') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying all chars together should have better performance. We can find '\0' first and then copy the needed chars.


uint32_t codec_id = 0;
std::memcpy(&codec_id, &buf[buffer_head_], sizeof(uint32_t));
// Mp4 is in big-endian
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's refine the comment here. Shall we always swap the bytes of codec id? And do we need to swap the bytes of any other field?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked to Kaido offline, and confirmed that Cobalt only supports little endian. As long as this code won't be upstreamed, it's ok to assume that the platform is little endian. However, please comment and DCHECK() somewhere (say in the ctor).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a preprocessor check in the ctor

starboard/shared/libiamf/iamf_audio_decoder.cc Outdated Show resolved Hide resolved
if (kEnableSurroundAudio) {
SbMediaAudioConfiguration out_config;
SbMediaGetAudioConfiguration(0, &out_config);
int channels = std::max(out_config.number_of_channels, 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we cap the channels to 2 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only executes if we want to force surround sound - it's 2 channels by default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think audio output channels can be any number. We don't have to throw out an error if the decoder can't downmix/upmix the audio to match output channels. For example, if the audio output device supports up to 10 channels and the decoder can support up to 8 output channels, we can set the decoder output channels to 8.

But let's consult with the h5 player team to see their preference.

SB_DLOG(INFO) << "Defaulting to stereo output.";
}

error = IAMF_decoder_output_layout_set_sound_system(decoder_, sound_system);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SbMediaGetAudioConfiguration() will return the audio output device capabilities. I don't think we should set the output sound system based on the audio output devices directly. For example, if the input is stereo and we have 5.1 audio output system, we'll set output sound system to 5.1 for stereo input here. Is that expected? And what will happen if we do that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IAMF decoder will upmix the stereo signal to 5.1, though that may not be desirable. I think we'll need some way for the player to signal to Cobalt if it wants 5.1 or stereo, without changing the audio stream.

For demo purposes I'll need a separate build that forces an SbPlayer to be created with 6 channels, since we can't tell based on the stream if the player expects 6 channels or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a bug and consult it with the h5 player team for whether and when they want IAMF decoder to upmix the audio.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b/364414955

starboard/shared/libiamf/iamf_audio_decoder.cc Outdated Show resolved Hide resolved
starboard/shared/libiamf/iamf_audio_decoder.cc Outdated Show resolved Hide resolved
starboard/android/shared/media_is_audio_supported.cc Outdated Show resolved Hide resolved
starboard/shared/libiamf/iamf_audio_decoder.h Outdated Show resolved Hide resolved
starboard/shared/libiamf/iamf_audio_decoder.cc Outdated Show resolved Hide resolved
starboard/shared/libiamf/iamf_audio_decoder.cc Outdated Show resolved Hide resolved
starboard/shared/libiamf/iamf_config_reader.h Outdated Show resolved Hide resolved

// Decodes an Leb128 value and stores it in |value|. Returns the number of bytes
// read. Returns -1 on error.
int ReadLeb128Value(const uint8_t* buf, uint32_t* value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have unit tests for this, feel free to implement in a pending PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

case kObuTypeCodecConfig: {
sample_rate_ = 0;
uint32_t codec_config_id;
bytes_read = ReadLeb128Value(&buf[buffer_head_], &codec_config_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to double check if there is logic here to ensure that there are enough bytes for ReadLeb128Value() to read, instead of reading out of bounds, as we don't pass size remaining to ReadLeb128Value().

}
}

SB_CHECK(completed_parsing);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will crash in production, just want to double check if this is intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

completed_parsing should be set to true in ReadOBU() line 229, after the reader finishes parsing the Descriptor OBUs and begins to read the encoded audio data. Though this variable is no longer needed.

const uint8_t* buf = input_buffer->data();
SB_DCHECK(buf);

bool completed_parsing = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be indicating that the OBU is the last one, consider naming it something like "is_last_obu", etc..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this variable.

return false;
}

int next_obu_pos = buffer_head_ + obu_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious that buf as a parameter and buffer_head_ as a member variable are used together for the parsing head here. Consider refactoring, either by passing buf and advancing buf inside ReadOBU() and ReadOBUHeader(), or passing buffer_head as parameter to these functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've passed the buffer position as a parameter.

starboard/shared/libiamf/iamf_config_reader.h Outdated Show resolved Hide resolved

bool ResetAndRead(scoped_refptr<InputBuffer> input_buffer);

void Reset();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


void Reset();

bool is_valid() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

starboard/shared/libiamf/iamf_audio_decoder.h Outdated Show resolved Hide resolved
uint32_t config_size() { return config_size_; }
// TODO: Allow for selection of multiple mix presentation IDs. Currently,
// only the first mix presentation parsed is selected.
bool has_mix_presentation_id() { return mix_presentation_id_.has_value(); }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

starboard/shared/libiamf/iamf_audio_decoder.cc Outdated Show resolved Hide resolved
starboard/android/shared/media_is_audio_supported.cc Outdated Show resolved Hide resolved
starboard/shared/libiamf/iamf_audio_decoder.cc Outdated Show resolved Hide resolved
starboard/shared/libiamf/iamf_config_reader.h Outdated Show resolved Hide resolved
std::copy(std::begin(input_buffers), std::end(input_buffers),
std::back_inserter(pending_audio_buffers_));
consumed_cb_ = consumed_cb;
DecodePendingBuffers();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's try to make it more straightforward if that's not necessary for IAMF decoder.

SB_DLOG(INFO) << "Defaulting to stereo output.";
}

error = IAMF_decoder_output_layout_set_sound_system(decoder_, sound_system);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's create a bug and consult it with the h5 player team for whether and when they want IAMF decoder to upmix the audio.

starboard/shared/libiamf/iamf_audio_decoder.cc Outdated Show resolved Hide resolved
if (kEnableSurroundAudio) {
SbMediaAudioConfiguration out_config;
SbMediaGetAudioConfiguration(0, &out_config);
int channels = std::max(out_config.number_of_channels, 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think audio output channels can be any number. We don't have to throw out an error if the decoder can't downmix/upmix the audio to match output channels. For example, if the audio output device supports up to 10 channels and the decoder can support up to 8 output channels, we can set the decoder output channels to 8.

But let's consult with the h5 player team to see their preference.


SB_DCHECK(samples_decoded <= reader.samples_per_buffer());

decoded_audio->ShrinkTo(audio_stream_info_.number_of_channels *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the same size as at line 164, so it's redundant.

Does samples_decoded always equal to reader.samples_per_buffer()? If so, we can remove the shrinking code and add a SbDCheck() here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not always, during a playback of https://www.youtube.com/watch?v=wDy_YFUOfl4 the first buffer contains 648 samples while the rest contain 960

starboard/shared/libiamf/iamf_config_reader.cc Outdated Show resolved Hide resolved
starboard/shared/libiamf/iamf_config_reader.cc Outdated Show resolved Hide resolved
starboard/shared/libiamf/iamf_config_reader.cc Outdated Show resolved Hide resolved
starboard/shared/libiamf/iamf_config_reader.cc Outdated Show resolved Hide resolved
starboard/shared/libiamf/iamf_audio_decoder.cc Outdated Show resolved Hide resolved
SB_DCHECK(is_valid());

if (input_buffer->size() == 0) {
SB_LOG(ERROR) << "Empty input buffer written to IamfAudioDecoder";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the function returns false, we should report an error here.

}

IamfBufferParser::IamfBufferInfo info;
IamfBufferParser().ParseInputBuffer(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make ParseInputBuffer() static.

IamfBufferParser().ParseInputBuffer(
input_buffer, &info, kForceBinauralAudio,
kForce6ChannelAudio | kForce8ChannelAudio);
if (!info.is_valid()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParseInputBuffer() returns a boolean to indicate if the input buffer can be parsed properly. We can use that returned value instead of a is_valid() function.

}
} else {
IAMF_SoundSystem sound_system = SOUND_SYSTEM_A;
if (kForce6ChannelAudio) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hardcoding 6 or 8 channels when using surround sound, can we set the output to match the channel count of user audio outputs?


// Decodes an Leb128 value and stores it in |value|. Returns the number of
// bytes read, capped to sizeof(uint32_t). Returns the number of bytes read,
// or -1 on error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the leb128 values used in iamf header always be 4 bytes long? The parsing algorithm here is different comparing to other leb128 parsing algorithm.

bool ParseInputBuffer(const scoped_refptr<InputBuffer>& input_buffer,
IamfBufferInfo* info,
const bool prefer_binaural_audio,
const bool prefer_surround_audio);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can |prefer_binaural_audio| and |prefer_surround_audio| both be true? If not, we should combine them into one boolean.

// https://aomediacodec.github.io/iamf/v1.0.0-errata.html#paramdefinition
bool SkipParamDefinition(BufferReader* reader) const;

std::unordered_set<uint32_t> binaural_audio_element_ids_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can store audio element ids as a local variablesin ParseInputBufferInternal() instead of a instance variable.

binaural_audio_element_ids_.insert(audio_element_id);
} else if (loudspeaker_layout > IA_CHANNEL_LAYOUT_STEREO &&
loudspeaker_layout < IA_CHANNEL_LAYOUT_COUNT) {
surround_audio_element_ids_.insert(audio_element_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to use one set for both binaural and surround audio elements, and store both element id and layout info in the set, like std::unordered_set<std::pari<uint32_t, uint8_t>>.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants